Never before in history have there been so many people on Earth as right now. The number has soared over the years, from around 1 billion in 1800 to 7.5 billion in 2017.
Estimates of the population in earlier times have been made too: when agriculture emerged, around 10,000 BC, the world population ranged between 1 million and 15 million. Even earlier, about 70,000 years ago, some studies suggest that humans may have gone through a bottleneck of 1,000 to 10,000 people, according to the theory of the Toba supervolcanic eruption\(^{[1]}\).
Given the population growth of the last century, what should we expect for the next one? Will this lead to major changes in our lifestyle, or will this lead to wars, poverty problems, lack of primary resources and so on?
Or maybe all those are just unwarranted fears and everything is going to fix itself?
For this study I combined several datasets.
I started my analysis with some datasets from World Bank Open Data (https://data.worldbank.org/), where I downloaded the data on total population, birth rate and death rate (both expressed per 1,000 people) and the fertility rate of each Country; these datasets contain the values of each indicator from 1960 to 2016 for almost every Country in the world, with some missing data.
To analyze the situation in Italy in earlier years as well (1700 - 1960), I added the data found here: https://www.populstat.info/Europe/italyc.htm.
To have an estimate of the world’s population from year 1 AD (anno Domini), I also took data from here: https://www.ecology.com/population-estimates-year-2050/.
Finally, I also made some use of the dataset “Countries of the world”, available on Kaggle at https://www.kaggle.com/fernandol/countries-of-the-world, which contains some geographic characteristics of the different countries; of those I actually used only the Region (which I will define better later) and the Area.
To analyze the data I made use of several R packages: dplyr, leaflet, ggplot2, tidyr and stringr are the main ones, but I also used geojsonio, rworldmap and countrycode to parse the data, plotly to create some plots, and htmlwidgets and htmltools to save and display some interactive maps.
Let’s now start to explore our datasets a bit.
Early Ages
First of all, let’s have a look at the first stages of human population growth, from year 1 A.D. to 1800 A.D.
To generate this plot, I just had to read a simple three-column table. I decided to use plotly to get an interactive view of the data.
The table contains a lower and an upper estimate of the world population for the years shown on the graph. As you can see, we start in year one with a mean value of 285 million individuals, a number which then grew to around 1.6 billion in 1900: in those ages, the world population starts to grow with a trend that looks roughly exponential.
World Bank Data
Let’s take a look at the datasets from World Bank Open Data.
When you download an indicator from this site, you end up with three files:
Indicator Name and Code are of little use for our purposes, because they are only a brief description of the table content. Country Name and Code are instead essential: each observation (row) of the table contains all the data values for one single Country, for the 1960 - 2017 time frame.
We can say that these data frames are tidy: the primary key is clearly present (the Country name, but the Code can be used as well), each observation has its own row, and each variable sits in exactly one column, if we consider each year as a variable on its own. In some of my later uses, however, “Year” will become a variable, and I will have to use the gather function to reshape my dataset.
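The reshaping mentioned above, from one column per year to a single “Year” variable, can be sketched as follows. This is a minimal illustration in Python, not the author's R code: the function name `wide_to_long`, the output column names "Year" and "Value", and the sample rows are all assumptions made for the example.

```python
def wide_to_long(rows, id_columns):
    """Turn one-column-per-year rows into one row per (country, year) pair."""
    long_rows = []
    for row in rows:
        ids = {name: row[name] for name in id_columns}
        for column, value in row.items():
            if column in id_columns:
                continue
            # Every remaining column name is assumed to be a year such as "1960".
            long_rows.append({**ids, "Year": int(column), "Value": value})
    return long_rows


# Made-up placeholder rows, not real World Bank values.
wide_rows = [
    {"Country Name": "A-land", "Country Code": "AAA", "1960": 100.0, "1961": 110.0},
    {"Country Name": "B-land", "Country Code": "BBB", "1960": 200.0, "1961": None},
]

long_rows = wide_to_long(wide_rows, ["Country Name", "Country Code"])
for row in long_rows:
    print(row)
```

Missing values are carried over unchanged (here as `None`), so they can still be filtered out after the reshape.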
The only thing I had to do to “tidy” my data was to remove some rows and columns which contained only NA values.
It is important to note that this dataset also contains some non-Country data, inserted as rows: for example, there is the row “World”. It was not necessary to remove them, but I needed to be aware that they were present.
At this point I decided to add to each Country the “Continent” and “Region” variables (the latter indicates the part of the Continent in which the Country is located), to check how the selected indicators depend on them. To do this I had to modify the dataset, adding some columns by means of dplyr and its mutate command.
The steps described above were applied to all the datasets downloaded from the World Bank, and can be found in the first part of the “world.R” script file.
Population Growth Trend
So, first big question: what is the trend of world population growth in recent years? I decided to plot it but, to gain a deeper understanding of the growth process, I also looked at the percentage of growth for every year, computed as:
\[ GrowthPercentage = \frac{P_{t}-P_{t-1}}{P_{t-1}} \cdot 100 \]
Where \(P_{t}\) represents the population in a certain year and \(P_{t-1}\) the population in the preceding year.
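As a small worked example of the yearly growth percentage, here is a Python sketch (not the author's R code). It divides the yearly change by the previous year's population, which is the usual convention for a growth rate. The function name `growth_percentages` and the toy numbers are assumptions for illustration only, and the years are assumed to be consecutive.

```python
def growth_percentages(population_by_year):
    """Return {year: percentage growth with respect to the previous year}."""
    years = sorted(population_by_year)
    result = {}
    for previous, current in zip(years, years[1:]):
        p_prev = population_by_year[previous]
        p_curr = population_by_year[current]
        # Change over one year, relative to the previous year's population.
        result[current] = (p_curr - p_prev) / p_prev * 100
    return result


# Toy numbers chosen for easy arithmetic, not real population data.
toy_population = {2000: 1000.0, 2001: 1020.0, 2002: 1030.2}

toy_growth = growth_percentages(toy_population)
for year, pct in toy_growth.items():
    print(year, round(pct, 3))
```

The first year has no predecessor, so it gets no growth value.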
So, the population is still growing, but the growth rate seems to be falling quickly: it went from around 2.0% to 1.0% within 57 years! This is good news.
Population Distribution
To complete our understanding, let’s now have a look at the distribution of people across the Continents and the different Regions in 2017. I checked several years, but the distribution is always much the same, so I report here only the plot for the year 2017.
Given the “total population” data, I created a new dataset containing the density of people in the different Countries, to then show it on a map: to do so I kept the “Area” measure from the “countries_world” dataset taken from Kaggle and computed the density as the number of people divided by the area. Then, I had to create a discrete scale of values to make the map work.
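The density computation and the discrete scale can be sketched like this in Python (an illustration, not the author's R code). The thresholds in `BREAKS`, the labels, and the function names are hypothetical, since the text does not give the actual scale; the area is taken in whatever unit the source dataset uses.

```python
from bisect import bisect_right

# Hypothetical class boundaries, in people per unit of area.
BREAKS = [10, 50, 100, 500]
LABELS = ["<10", "10-50", "50-100", "100-500", ">=500"]


def density(population, area):
    """Number of people divided by the area of the Country."""
    return population / area


def density_class(value):
    """Map a density to one of the discrete classes used to color the map."""
    # bisect_right counts how many boundaries are <= value.
    return LABELS[bisect_right(BREAKS, value)]


# Placeholder inputs, not real country data.
d = density(1000, 10)
print(d, density_class(d))
```

A value equal to a boundary falls into the upper class, so a density of exactly 10 is labeled "10-50".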
Let’s now have a look at the density map: below you can see a leaflet representation of the distribution of people across the whole world.
For it, I had to perform further data adjustments because some Country names did not match those leaflet expected: for example, “United States of America” was expected instead of “United States”, so I had to check a lot of Country names manually.
Now that we have had a look at our data, let’s move on to the main questions, and let’s see what happened to the developing Countries in their past, to try to understand the direction in which the World and its citizens are heading.
What happens when a poor country starts to move towards welfare and shifts to an industrialized economic system? The birth rate, which is usually high in a poor country, is no longer compensated by a high death rate, and the population starts to grow. Then, at a certain point, fears about overpopulation start to rise.
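The manual name check can be partly automated. Below is a hedged Python sketch (not the author's R code): only the “United States” entry comes from the text, while the function name `harmonize`, the table `NAME_FIXES`, and the sample inputs are assumptions.

```python
# Known renames; only this entry is taken from the text above.
NAME_FIXES = {"United States": "United States of America"}


def harmonize(names, expected_names, fixes=NAME_FIXES):
    """Apply known renames and report names that still have no match."""
    renamed = [fixes.get(name, name) for name in names]
    unmatched = sorted(set(renamed) - set(expected_names))
    return renamed, unmatched


# Placeholder inputs; "Atlantis" stands in for a name the map does not know.
data_names = ["Italy", "United States", "Atlantis"]
map_names = {"Italy", "United States of America"}

renamed, unmatched = harmonize(data_names, map_names)
print(renamed)
print(unmatched)
```

The list of unmatched names is what would still need to be checked by hand.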
In 1929 the American demographer Warren Thompson developed the theory of the Demographic Transition model\(^{[5]}\), which is based on the observation of changes - or transitions - from high birth and death rates to lower birth and death rates, during the industrial growth of a Country.
This theory can involve four or five stages of transition in the trend of population growth. Here’s a summary of the five steps:
This is a model: consequently, it is idealized and may not be accurate in every case, so it remains to be seen whether it can be applied to the less developed societies of today.
Let’s have a look at the situation in Italy, for example: in the graphs below you can see the trend of births, deaths, and the total population from 1960 to 2016.
To find out which Stage of the Demographic Transition a country is in, we can look at the evolution of its death rate and birth rate, together with the total population trend, over a certain time frame (Wikipedia example).
Thanks to plotly, I ended up with these graphs:
To get a better view of the situation I searched for a dataset of the total population which also included data for earlier years: below is shown the trend of the total population, starting from 1700, where the green shaded area covers the same time span as the Population graph above.
Unfortunately, I could not find birth rate and death rate data for those years as well.
As we can see, the birth rate has been descending over the years, while the death rate has remained almost the same. The total population is growing, but in the most recent years the growth has slowed down a bit because the birth rate has decreased. It looks like Italy is in stage three of the model, but some online resources (for example https://italyfogartyl.weebly.com/people-and-population.html) state: “Italy is currently in Stage 4 of the Demographic Transition Model (DTM). Stage 4 is characterized as having low birth rates, low death rates, and a low Population Growth Rate (PGR), causing the population to stabilize”.
Let’s also have a look at the Population Growth Rate (PGR), then! PGR is defined like this:
\[ PGR = \frac{P(t_2) - P(t_1)}{P(t_1)(t_2 - t_1)} \]
The PGR measures the rate at which the number of individuals in a population increases in a given time period. A positive value of this quantity indicates that the population is increasing, while a negative one indicates that it is decreasing. A value of zero means that the population has not changed in the selected amount of time.
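The PGR formula above translates directly into code. This is a minimal Python sketch (not the author's R code); the function name `population_growth_rate` and the toy inputs are assumptions for illustration.

```python
def population_growth_rate(p1, p2, t1, t2):
    """PGR = (P(t2) - P(t1)) / (P(t1) * (t2 - t1))."""
    if t2 == t1:
        raise ValueError("t1 and t2 must be different times")
    return (p2 - p1) / (p1 * (t2 - t1))


# Toy values, not real data: growth, decline and no change over one year.
growing = population_growth_rate(1000.0, 1010.0, 2000, 2001)
shrinking = population_growth_rate(1000.0, 990.0, 2000, 2001)
stable = population_growth_rate(1000.0, 1000.0, 2000, 2001)
print(growing, shrinking, stable)
```

The sign of the result matches the three cases described in the text: positive, negative, and zero.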
Let’s compute it on the data used above, using one-year differences as time intervals. Here you can see the outcome:
As we can see from the plot, the PGR values have been quite low from 1960 on, and in the last years (2015 - 2016) they start to become negative. This is good evidence that Italy’s population is diminishing, and so that Italy currently stands in Stage Four of the Demographic Transition.
Let’s get to the point: we want to know whether we can predict what the world population will be in the coming years, let’s say, until 2100.
To do so, I will try to make a prediction using the Logistic model for Population Growth\(^{[6]}\), which can be described by the Pearl-Reed logistic equation:
\[ \frac{dN}{dt} = rN(1-\frac{N}{K}) \]
This formula is used to describe the self-limiting growth of a biological population, and was first published (in a different form) in 1838 by Verhulst, a Belgian mathematician and statistician. Pearl and Reed popularized the equation in the twentieth century.
In the equation, N represents the number of individuals at time t, r the intrinsic growth rate and K the maximum number of individuals that the environment can support. It can be integrated, obtaining:
\[ N(t) = \frac{K N_0 e^{rt}}{K + N_0(e^{rt}-1)} \] Where \(N_0\) is the starting number of individuals.
The main feature of the logistic model is that it takes the shape of a sigmoid curve and describes the growth of a population as an exponential phase followed by a slowdown in growth, bounded by the carrying capacity of the environment.
I then used the World Population data between 1960 and 2017 to try out this model and predict how large the population will be in 2100 (the code is inside the script prospects.R), ending with this graph:
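To make the sigmoid behaviour concrete, here is a small Python sketch (not the code in prospects.R) that evaluates the standard closed-form solution of the logistic equation, with a growing exponential \(e^{rt}\), and numerically checks it against \(dN/dt = rN(1-N/K)\). The parameter values are arbitrary illustrations, not fitted to any population data, and the function names are assumptions.

```python
import math


def logistic_population(t, n0, r, k):
    """Closed-form logistic solution: N(0) = n0 and N(t) -> k for large t."""
    growth = math.exp(r * t)
    return k * n0 * growth / (k + n0 * (growth - 1.0))


def logistic_rhs(n, r, k):
    """Right-hand side of the differential equation dN/dt = r N (1 - N/K)."""
    return r * n * (1.0 - n / k)


# Arbitrary illustrative parameters (NOT estimates for the world population).
N0, R, K = 1.0, 0.5, 10.0

for t in (0, 2, 5, 10, 20):
    print(t, round(logistic_population(t, N0, R, K), 4))

# Sanity check: a central finite difference should match the ODE at t = 2.
h = 1e-5
slope = (logistic_population(2 + h, N0, R, K) - logistic_population(2 - h, N0, R, K)) / (2 * h)
ode_value = logistic_rhs(logistic_population(2, N0, R, K), R, K)
print(abs(slope - ode_value) < 1e-6)
```

The curve starts at `N0`, grows almost exponentially at first, and then flattens as it approaches the carrying capacity `K`.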
[1] https://en.wikipedia.org/wiki/Toba_catastrophe_theory
[2] https://www.kaggle.com/fernandol/countries-of-the-world
[3] https://data.worldbank.org/
[4] https://www.ecology.com/population-estimates-year-2050/ (early ages)
[5] https://en.wikipedia.org/wiki/Demographic_transition
[6] https://en.wikipedia.org/wiki/Logistic_function#In_ecology:_modeling_population_growth
[7] https://en.wikipedia.org/wiki/Projections_of_population_growth
[8] http://www.clker.com/clipart-530947.html (clipart)